Abstract: Counting and classifying blood cells is an important diagnostic tool in medicine.
Support Vector Machines (SVMs) are increasingly popular and efficient and could replace
artificial neural network systems. Here a method to classify blood cells using an SVM is
proposed. A set of image statistics is implemented in C++. The MPEG-7 descriptors Scalable
Color Descriptor, Color Structure Descriptor, Color Layout Descriptor and Homogeneous
Texture Descriptor are extended in size and combined with textural features corresponding
to textural properties perceived visually by humans. These statistics are collected from a
set of images of human blood cells. An SVM is implemented and trained to classify the cell
images. The cell images come from a CellaVision DM-96 machine, which classifies cells
from microscopy images. The output images and classification of the CellaVision machine
are taken as ground truth, a truth that is 90-95% correct. The problem is divided in two:
the primary and the simplified. The primary problem is to classify the same classes as the
CellaVision machine. The simplified problem is to distinguish between the five most
common types of white blood cells. An encouraging result is achieved in both cases, with
error rates of 10.8% and 3.1% respectively, considering that the SVM is misled by the
errors in the ground truth. The conclusion is that further investigation of performance is
worthwhile.



Classification of cell images using MPEG-7-inspired
measures and support vector machines in cell morphology

Introduction



After the introduction of the MPEG-7 descriptors by the Moving Picture Experts
Group (MPEG) committee [13], it is interesting to see how these features perform
in the field of machine learning. In this thesis a subset of them is tested on
the problem of classifying different cell types, i.e. cell morphology, using
Support Vector Machines.

In medicine, more specifically in the fields of hematology and infectious
diseases, classifying different kinds of blood cells can be used as a diagnostic
tool: by counting the relative frequencies of certain cells and comparing them to
what is normal, conclusions can be drawn about possible diagnoses.

Classification of cells by microscopy is used to characterize infectious diseases
by determining the relative amount of cells called neutrophils compared to the
amount of cells called lymphocytes. Typical relative frequencies of the cells are
found in table 1.1. Typical images of some common cells are found in figure 1.1.

Another method used is flow cytometry where receptors on the cells are 
colored and the different types of cells are counted. Flow cytometry uses a 
complicated and expensive apparatus while microscopy is very cheap. 

However, microscopy is personnel intensive. Many cells are hard to classify
even for human experts, and often several experts are needed to be certain.
Classifying cells requires extensive training and, moreover, regular and
frequent work to sustain competence. This competence is impossible to sustain
at small clinics or in the countryside, especially in developing countries.
Instead, samples have to be sent to hematology labs.

Type                       Approx. Abundance
neutrophil granulocytes    70%
eosinophil granulocytes    1-6%
basophil granulocytes      0.01-0.3%
lymphocytes                20-40%
monocytes                  3-8%

Table 1.1: Abundance of different types of white blood cells (leukocytes) in
healthy humans

Figure 1.1: Some typical images of common white blood cells. Panels:
(a) Neutrophil granulocyte, segmented (class 1); (b) Neutrophil granulocyte,
band (class 6); (c) Eosinophil granulocyte (class 2); (d) Basophil granulocyte
(class 3); (e) Lymphocyte (class 4); (f) Monocyte (class 5)

As processing power becomes cheaper and machine learning and computer 
vision algorithms grow better, machines can help less experienced personnel or 
give preliminary results while waiting for definite results. 

The problem this thesis investigates is how well these different types
of white blood cells can be classified using a Support Vector Machine and a set
of measures on the images, called features.

There has been a lot of hype about Support Vector Machines since their
introduction in the 1990s. SVMs are applied within a broad range of fields, from
bioinformatics [11] to food engineering, iris recognition, texture classification
and object recognition [25]. The SVM is now one of the standard tools available
for machine learning. A recent search for "Support Vector Machine" (SVM) gave
6 394 articles compared to 17 893 for "Artificial Neural Network" (ANN), which
has existed for much longer. That is why my supervisor and I chose to work
with SVM.

The SVM is trained with measures of the cell images, called features or
descriptors. These are values that describe the essence of an image. In this
thesis I describe and implement a subset of the color and texture descriptors
found in the MPEG-7 standard, with minor variations. I chose to work with
MPEG-7 as a guide because of the MPEG committee's well-known expertise.

The MPEG committee developed, among other things, the audio compression
techniques used in MPEG-1 Layer 3 (MP3), the video compression used in DVDs
(MPEG-2) and MPEG-7. The committee consists of experts from a broad
range of areas that deal with digital information. [14]

MPEG-7 identifies several descriptors which proved useful in the Color
and Texture Core Experiments during the development of the standard. They have
proved useful in image browsing, search and retrieval as well as in image
classification [19]. Color histogram based features have been successful both in
image retrieval [18] and image classification systems. Texture features
like the Gabor wavelet filter bank used in MPEG-7 have been successfully applied
to iris [15] and facial expression recognition.
2.1 Support Vector Machines

In this section I briefly introduce Support Vector Machines from a
theoretical perspective. Further introduction may be found in chapters 6 and 7
of Bishop's book. If more substance is wanted I recommend reading the
whole book by Cristianini and Shawe-Taylor. The very thorough coverage
of the topic by its originator Vapnik in his book [20], sometimes
called the bible, was often an additional useful source for me.



2.1.1 Supervised Learning 

Supervised learning is a kind of machine learning where the machine is fed with
examples, i.e. instances of data tied to their class. The machine is told what
class an instance belongs to.

The task that a learning machine performs is to recognize an element
$x \in \mathcal{X}$ as a member of a class, i.e. to classify it. These classes
are called destination values and I use the notation $y \in \mathcal{Y}$. In the
binary case, for example, $\mathcal{Y} = \{-1, +1\}$. The task is then to
construct a function $d$ such that $d(x, \alpha) = y$, where $\alpha$ is the
information the machine has gathered during the training process. During
training, the machine observes a tuple of pairs

\[ S = ((x_1, y_1), \ldots, (x_\ell, y_\ell)) \subseteq (\mathcal{X} \times \mathcal{Y})^\ell, \]

which is called the training set, and produces parameters
$\alpha \in \mathbb{R}^n$ deduced from this information. [8]



2.1.2 Linear Learning Machines 

Imagine the space $\mathcal{X}$ which has $n$ dimensions. To be able to classify
instances into the two classes labeled positive, $y = +1$, or negative, $y = -1$,
a hyperplane, i.e. an affine subspace of dimension $n - 1$, must be found that
separates the instances of the respective classes from each other. If such a
hyperplane exists, the data is said to be linearly separable.

Imagine a two-dimensional coordinate system in which the instances are
placed. If a straight line can be placed between the two classes of instances,
the data is linearly separable. That straight line is a hyperplane of dimension
1. The generalized hyperplane of dimension $n - 1$ is defined by the equation

\[ \langle w, x \rangle + b = 0. \]

The normal vector $w$ is orthogonal to the hyperplane and the bias $b$ determines
the hyperplane's offset from the origin.
Now consider the function

\[ f(x) = \langle w, x \rangle + b = \sum_{i=1}^{n} w_i x_i + b \tag{2.1} \]

Where: $x$ - instance
       $w$ - coefficients learned
       $b$ - system bias

It tells whether an instance is above or below the hyperplane. This is similar
to linear regression in statistics.

A decision function for the binary classification case then becomes

\[ d(x) = \operatorname{sgn}(f(x)), \qquad
   \operatorname{sgn}(a) = \begin{cases} -1, & a < 0 \\ +1, & a \ge 0 \end{cases} \]
An example of an iterative algorithm that finds the vector $w$ from a set of
$x \in \mathcal{X}$ is Rosenblatt's perceptron, which was the first and simplest
type of Artificial Neural Network (ANN). It is guaranteed to converge if the
data is linearly separable. This criterion can also be written

\[ \exists w \, \forall i : \gamma_i = y_i(\langle w, x_i \rangle + b) > 0, \]

i.e. all instances are classified correctly. The quantity $\gamma_i$ is called
the margin as it specifies how far from the hyperplane an instance is. If $w$
and $b$ are normalized to $w/\|w\|$ and $b/\|w\|$, then the margin is called the
geometric margin, which measures the Euclidean distance from the point $x_i$ to
the hyperplane. The closest point, the $x_i$ with minimal $\gamma_i$, defines the
margin of a hyperplane, which is a stripe of empty space where no instances are.
If the data is not linearly separable, $\exists i : \gamma_i < 0$.
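
The decision function and the perceptron update described above can be sketched in C++ as follows. This is a minimal illustration, not the thesis implementation: the names `decision_value`, `classify` and `train_perceptron`, the learning rate `eta` and the epoch limit are assumptions made for the example, and sgn(0) is taken as +1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// f(x) = <w, x> + b, as in equation (2.1).
double decision_value(const Vec& w, double b, const Vec& x) {
    assert(w.size() == x.size());
    double sum = b;
    for (std::size_t i = 0; i < w.size(); ++i) sum += w[i] * x[i];
    return sum;
}

// d(x) = sgn(f(x)); sgn(0) is mapped to +1 (an assumption of this sketch).
int classify(const Vec& w, double b, const Vec& x) {
    return decision_value(w, b, x) < 0.0 ? -1 : +1;
}

// Rosenblatt-style perceptron. Returns true once a whole pass over the data
// produced no mistake, i.e. every margin y_i * f(x_i) is positive.
bool train_perceptron(const std::vector<Vec>& xs, const std::vector<int>& ys,
                      Vec& w, double& b, double eta = 1.0, int max_epochs = 1000) {
    assert(xs.size() == ys.size());
    for (int epoch = 0; epoch < max_epochs; ++epoch) {
        bool mistake = false;
        for (std::size_t i = 0; i < xs.size(); ++i) {
            if (ys[i] * decision_value(w, b, xs[i]) <= 0.0) {
                // Misclassified (or on the hyperplane): move the plane towards x_i.
                for (std::size_t k = 0; k < w.size(); ++k) w[k] += eta * ys[i] * xs[i][k];
                b += eta * ys[i];
                mistake = true;
            }
        }
        if (!mistake) return true;
    }
    return false;  // no separating hyperplane found within max_epochs
}
```

On linearly separable data the loop stops as soon as all margins are positive; on non-separable data it simply gives up after `max_epochs`.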

2.1.3 Maximum Margin Classifier 

The task of a maximum margin classifier is to maximize the margin, which, as
can be motivated using statistical learning theory, gives the least
generalization error.

The maximum margin solution, the optimal $w$ and $b$, is found by solving

\[ \arg\max_{w,b} \left\{ \frac{1}{\|w\|} \min_i \left[ y_i(\langle w, x_i \rangle + b) \right] \right\}. \]



To solve this, first rescale $w \to \kappa w$ and $b \to \kappa b$. The
geometric margin $\min_i \gamma_i$ is still the same. Then set

\[ \gamma_j = y_j(\langle w, x_j \rangle + b) = 1 \]

for the point that is closest to the hyperplane. All points then have
$\gamma_i \ge 1$, and since the minimum $\gamma_i = 1$, all that has to be done
is to maximize $\|w\|^{-1}$ or, equivalently, minimize $\|w\|^2$. The problem
that is left is to

\[ \text{find } \arg\min_{w,b} \frac{\|w\|^2}{2} \quad \text{subject to } \gamma_i \ge 1, \tag{2.2} \]

which is much easier. This is what is called a quadratic programming
problem and can be solved using optimization theory and Lagrange
multipliers.

2.1.4 Optimization Theory 

The theory of Lagrange multipliers states that to

\[ \text{optimize } f(x) \quad \text{subject to } g(x) \ge 0 \]

one should optimize the Lagrangian function

\[ L(x, \alpha) = f(x) + \alpha g(x) \]
\[ \text{subject to } g(x) \ge 0, \quad \alpha \ge 0, \quad \alpha g(x) = 0. \]

These conditions are known as the Karush-Kuhn-Tucker (KKT) conditions.
More generally, to add more constraints $g_j(x)$, replace the term $\alpha g(x)$
with a linear combination of all Lagrange multipliers $\alpha_j$ and their
corresponding functions:

\[ \text{optimize } L(x, \{\alpha_j\}) = f(x) + \sum_{j=1}^{J} \alpha_j g_j(x) \]
\[ \text{subject to } \forall j : g_j(x) \ge 0, \quad \alpha_j \ge 0, \quad \alpha_j g_j(x) = 0. \]
In order to quickly find a solution to (2.2), it can now be rewritten as the
Lagrangian function

\[ L(w, b, \alpha) = \underbrace{\frac{1}{2}\|w\|^2}_{f(x)}
   - \sum_{i=1}^{\ell} \alpha_i \left( y_i(\langle w, x_i \rangle + b) - 1 \right). \]

The constraint term is negative because we are minimizing with respect to $w$
and $b$ while maximizing with respect to $\alpha$. To finally arrive at what is
called the dual representation of the maximum margin problem, the derivatives of
$L$ with respect to $w$ and $b$ are set to 0, which gives
$w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$. Maximizing this
dual representation,

\[ W(\alpha) = L(\alpha) = \sum_{i=1}^{\ell} \alpha_i
   - \frac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle, \tag{2.3} \]

by finding $\alpha$ subject to

\[ \forall i : \alpha_i \ge 0, \qquad \sum_{i=1}^{\ell} \alpha_i y_i = 0, \]

will construct the maximal margin classifier. [8, 20]

The instances that have a corresponding $\alpha_i > 0$ are called support
vectors. That is because they lie on the margin. They are thus the only
instances used in the decision function.

Note how the input variables $x_i$ are only used in an inner product, which lets
the SVM avoid the curse of dimensionality caused by a data set with instances
of too high dimension.



2.1.5 The Kernel Trick 

The Kernel Trick is used implicitly in Support Vector Machines, but it has also
been tried out in e.g. RBF networks, which are a kind of ANN.

The inner product used in the dual optimization problem can be a linear
one. However, a linear inner product will not separate the instances fully when
the data set is not linearly separable; the data must then be mapped to another
space where it is.

A non-linear feature function $\phi(x)$ can do such a mapping. However, there is
no need to know the feature function explicitly; it is easier to define it
implicitly via a Mercer kernel.

A complete, normed space with an inner product is called a Hilbert space.
One of the beauties of Hilbert spaces is that any given function in the $L^2$
space can be approximated arbitrarily well in the $\|\cdot\|_2$ norm and
represented by an infinite linear combination of some coefficients and some
basis functions. An example of this is the Fourier series, using Fourier
coefficients and the Dirichlet kernel functions $\{e^{ikx}\}_k$.

A special kind of Hilbert spaces are the ones called Reproducing
Kernel Hilbert Spaces. A function
$\langle x_i, x_j \rangle = K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$
is called a kernel when it satisfies the criteria in Mercer's theorem.
A Mercer kernel $K$ is defined as an inner product on elements of some space
$\mathcal{X}$. An inner product is a function that is a positive-definite
sesqui-linear^1 form. In the real case this becomes a function

\[ \langle \cdot, \cdot \rangle : \mathcal{X} \times \mathcal{X} \to \mathbb{R} \]

such that

\[ K(x, z) = \langle x, z \rangle = \langle z, x \rangle = K(z, x) \quad \text{(Symmetry)} \]
\[ K(ax + by, cz) = c\,(a K(x, z) + b K(y, z)) \quad \text{(Bilinearity)} \]
\[ \forall x : K(x, x) \ge 0 \quad \text{(Positivity)} \]
\[ K(x, x) = 0 \iff x = 0 \quad \text{(Definiteness)} \]

The Gram matrix $G$ of a Mercer kernel, with entries $G_{ij} = K(x_i, x_j)$, is
a Hermitian matrix and has non-negative eigenvalues $\lambda_i$:

\[ \forall i : \lambda_i \ge 0 \quad \text{(Positive semi-definite Gram matrix)} \]

Note that the elements of the space $\mathcal{X}$ do not need to be real vectors,
as they are in this context; they could also be e.g. strings of symbols. As
soon as a symmetric sesqui-linear positive-definite function can be defined on
the elements of the space $\mathcal{X}$, the space becomes an inner product
space and the Support Vector Machine will do its job.

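Here are some commonly used Mercer kernels defined on
$\mathbb{R}^n \times \mathbb{R}^n$:

\[ \langle x, y \rangle_{\text{Linear}} = x^T y \quad \text{(Linear, dot product, kernel)} \]
\[ \langle x, y \rangle_{\text{Poly}} = (x^T y + 1)^d \quad \text{(Complete polynomial of degree } d\text{)} \]
\[ \langle x, y \rangle_{\text{RBF}} = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right) \quad \text{(Gaussian, Radial Basis Function)} \]
\[ \langle x, y \rangle_{\text{MLP}} = \tanh(x^T y + b) \quad \text{(Multilayer perceptron, for some } b\text{)} \]

The norm used in RBF is usually the Euclidean distance, $p = 2$ below:

\[ \|x - y\|_{L^p} = \left( \sum_i |x_i - y_i|^p \right)^{1/p} \quad (L^p \text{ distance}) \]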
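
As a hedged illustration of the kernels listed above, the following C++ sketch evaluates each of them on plain `std::vector<double>` inputs. The function names and the parameters `d`, `sigma` and `b` are assumptions of this example; in particular the Gaussian kernel is written in the common `exp(-||x - y||^2 / (2 sigma^2))` form, which may differ from the parametrization used in the thesis.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Plain dot product x^T y.
double dot(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    return sum;
}

double linear_kernel(const Vec& x, const Vec& y) { return dot(x, y); }

// Complete polynomial kernel of degree d.
double poly_kernel(const Vec& x, const Vec& y, int d) {
    return std::pow(dot(x, y) + 1.0, d);
}

// Gaussian (RBF) kernel using the squared Euclidean distance (p = 2).
double rbf_kernel(const Vec& x, const Vec& y, double sigma) {
    assert(x.size() == y.size() && sigma > 0.0);
    double sq_dist = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double diff = x[i] - y[i];
        sq_dist += diff * diff;
    }
    return std::exp(-sq_dist / (2.0 * sigma * sigma));
}

// Multilayer perceptron (sigmoid) kernel with offset b.
double mlp_kernel(const Vec& x, const Vec& y, double b) {
    return std::tanh(dot(x, y) + b);
}
```

Because the SVM only touches the data through these functions, switching kernel amounts to passing a different function when the Gram matrix is filled.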

2.1.6 Gradient Ascent 

An easy approach to find the coefficients $\alpha$ is to update them in the
direction of the gradient of the objective function $\nabla W(\alpha)$,

\[ \frac{\partial W(\alpha)}{\partial \alpha_i}
   = 1 - y_i \sum_{j=1}^{\ell} \alpha_j y_j \langle x_i, x_j \rangle. \]

To maximize the objective function $W(\alpha)$ one could just iterate

\[ \alpha_i \leftarrow \alpha_i + \eta \frac{\partial W(\alpha)}{\partial \alpha_i}. \]

Where: $\eta$ - the learning rate

It is shown e.g. in the book by Cristianini and Shawe-Taylor that setting
$\eta = 1/\langle x_i, x_i \rangle$ maximizes the gain if
$\alpha_i \in [0, C]$, $C \in \mathbb{R}$, and that convergence is guaranteed if
the hyperplane exists. [8]

^1 Anti-linear in the second argument and linear in the first.
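
The update rule can be sketched as below. This is an illustrative C++ sketch under stated assumptions, not the thesis code: it uses a linear kernel, updates the coefficients one at a time so that each new value is used immediately, clips them to [0, C], and leaves out the bias and the equality constraint on the sum of alpha_i y_i. The names `gram_matrix`, `train_dual` and `decision_value` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Matrix = std::vector<Vec>;

double dot(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    return sum;
}

// Gram matrix G[i][j] = <x_i, x_j> for the linear kernel.
Matrix gram_matrix(const std::vector<Vec>& xs) {
    Matrix G(xs.size(), Vec(xs.size(), 0.0));
    for (std::size_t i = 0; i < xs.size(); ++i)
        for (std::size_t j = 0; j < xs.size(); ++j) G[i][j] = dot(xs[i], xs[j]);
    return G;
}

// Gradient ascent on the dual objective W(alpha), one coefficient at a time,
// with learning rate 1 / G[i][i] and clipping to [0, C].
Vec train_dual(const Matrix& G, const std::vector<int>& ys, double C, int epochs) {
    const std::size_t n = ys.size();
    assert(G.size() == n);
    Vec alpha(n, 0.0);
    for (int e = 0; e < epochs; ++e) {
        for (std::size_t i = 0; i < n; ++i) {
            if (G[i][i] <= 0.0) continue;  // skip degenerate instances
            double sum = 0.0;
            for (std::size_t j = 0; j < n; ++j) sum += alpha[j] * ys[j] * G[i][j];
            const double grad = 1.0 - ys[i] * sum;  // dW / d alpha_i
            const double eta = 1.0 / G[i][i];
            alpha[i] = std::min(C, std::max(0.0, alpha[i] + eta * grad));
        }
    }
    return alpha;
}

// f(x) = sum_j alpha_j y_j <x_j, x>; there is no bias term in this sketch.
double decision_value(const std::vector<Vec>& xs, const std::vector<int>& ys,
                      const Vec& alpha, const Vec& x) {
    double sum = 0.0;
    for (std::size_t j = 0; j < xs.size(); ++j) sum += alpha[j] * ys[j] * dot(xs[j], x);
    return sum;
}
```

Updating the coefficients in place rather than all at once from the old values matters here: with this learning rate, a simultaneous update of two mirrored points can overshoot and oscillate.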



2.1.7 Multiclass SVM 

There are three major methods for training a set of classifiers to be able to
classify several classes [10], i.e. $|\mathcal{Y}| = k > 2$.

In the one-against-the-rest method, $k$ binary classifiers are created, where
classifier $i \in [0, k)$ is told that all examples with class $i$ are positive
and the rest are negative. When predicting which class $x$ belongs to, all
classifiers are tested and the one that gives the highest certainty wins.

In the one-against-one method, $k(k - 1)/2$ binary classifiers are created such
that all 2-combinations of classes $i, j$ have a corresponding classifier, since

\[ \binom{n}{2} = \frac{n!}{2!\,(n-2)!} = \frac{n(n-1)(n-2)!}{2\,(n-2)!} = \frac{n(n-1)}{2}. \]

The prediction is then done by voting: all binary classifiers vote on their
respective class $i$ or $j$. The class with the most votes wins. This approach
is called the "Max Wins" strategy.

Directed Acyclic Graph SVM (DAGSVM) is the third method. It uses the
same training method as one-against-one but a different decision mechanism.
The classifiers are placed in a rooted DAG with the classifiers as internal nodes
and the classes as leaves. Starting at the root, each binary decision means
moving either left or right. When a leaf is reached, the decision is made. [13]
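
To make the one-against-one voting concrete, here is a small C++ sketch of the "Max Wins" strategy. The callback `decide(i, j)` stands in for a trained binary classifier for the class pair (i, j) and is purely hypothetical; ties are broken in favour of the lowest class index, which is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Number of pairwise classifiers needed for k classes: k(k - 1) / 2.
std::size_t pair_count(std::size_t k) { return k * (k - 1) / 2; }

// "Max Wins" voting. decide(i, j) must return either i or j, the class the
// pairwise classifier for (i, j) prefers for the instance at hand.
std::size_t max_wins(std::size_t k,
                     const std::function<std::size_t(std::size_t, std::size_t)>& decide) {
    assert(k >= 2);
    std::vector<std::size_t> votes(k, 0);
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t j = i + 1; j < k; ++j) {
            const std::size_t winner = decide(i, j);
            assert(winner == i || winner == j);
            ++votes[winner];
        }
    std::size_t best = 0;
    for (std::size_t c = 1; c < k; ++c)
        if (votes[c] > votes[best]) best = c;  // first maximum wins on ties
    return best;
}
```

For the 14 classes of the primary problem this scheme would need `pair_count(14)` pairwise classifiers, compared to 14 classifiers for one-against-the-rest.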



2.2 Features 



Features, or descriptors, try to extract useful information from an image:
color distribution, measures on edges and texture properties. They capture
information in a more condensed and efficient way than just using the color
values in each pixel.

These descriptors are also scale invariant: it does not matter which size
the images have. This is necessary as the images have different sizes.

Scalable Color Descriptor, Color Structure Descriptor and Color Layout
Descriptor are the three color descriptors that I describe below and that are
implemented in the project. After the description of those come descriptions
of two texture descriptors. One of them is similar to the Homogeneous Texture
Descriptor from MPEG-7. The other set of descriptors, named Visual Texture
Features, is from an article by Amadasun and King which describes
computational measures that approximate how humans perceive texture. [1]
Figure 2.1: Two images, (a) and (b), that contain the same amount of black and
would yield an identical color histogram but a different color structure
descriptor.



2.2.1 Scalable Color Descriptor 

The HSV space is uniformly quantized into a 3D histogram of 256 bins. Hue
is divided into 16 levels, Saturation into 4 and Value into 4. In the MPEG-7
specification the 16 x 4 x 4 = 256 bin values are truncated to an 11-bit
integer, mapped to a non-linear 4-bit representation and then encoded using a
Haar transform to drastically reduce the space footprint. The scalability of
this descriptor comes from the ability to choose how many Haar coefficients to
store; see the article by Manjunath et al. for more details. [3]
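
The uniform 16 x 4 x 4 quantization can be sketched in C++ as below. The struct `Hsv`, the value ranges (hue in [0, 360), saturation and value in [0, 1]) and the ordering of the bins are assumptions of this example; the truncation and Haar encoding steps of MPEG-7 are not included.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One pixel in HSV; h in [0, 360), s and v in [0, 1] (assumed ranges).
struct Hsv { double h; double s; double v; };

// Map a value in [0, upper] to one of `levels` equally wide levels.
int quantize(double value, double upper, int levels) {
    const int q = static_cast<int>(value / upper * levels);
    return std::min(std::max(q, 0), levels - 1);
}

// Bin index in the 16 x 4 x 4 = 256 bin histogram.
std::size_t hsv_bin(const Hsv& c) {
    const int hq = quantize(c.h, 360.0, 16);
    const int sq = quantize(c.s, 1.0, 4);
    const int vq = quantize(c.v, 1.0, 4);
    return static_cast<std::size_t>((hq * 4 + sq) * 4 + vq);
}

// Relative-frequency histogram over all pixels of an image.
std::vector<double> hsv_histogram(const std::vector<Hsv>& pixels) {
    std::vector<double> hist(256, 0.0);
    if (pixels.empty()) return hist;
    for (const Hsv& p : pixels) hist[hsv_bin(p)] += 1.0;
    for (double& h : hist) h /= static_cast<double>(pixels.size());
    return hist;
}
```

Normalizing by the pixel count makes the histogram independent of image size, which matters here because the cell images differ in size.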

2.2.2 Color Structure Descriptor 

To express local color structure in an image, this descriptor slides an 8 x 8
structuring element across the image, counting in how many positions of the
element each color is present. With this technique one can distinguish between
the images in figure 2.1.

This descriptor is scale invariant as the structuring element's spatial extent
scales with the image size. Replacement sub-sampling is used if the image is
larger than 256 x 256 pixels. If e.g. a 512 x 512 image is processed,
every other row and column will represent the image and the rest of each 2 x 2
area is thrown away. More generally

\[ p = \max\{0, \operatorname{round}(0.5 \log_2(WH) - 8)\} \tag{2.5} \]
\[ K = 2^p, \quad E = 8K \tag{2.6} \]

Where: $E \times E$ - spatial extent of the structuring element
       $K$ - sub-sampling factor
       $W$, $H$ - width and height of the image

Each bin in the generated histogram represents the number of occasions a
structuring element is found to contain the color associated with the bin.
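
Equations (2.5) and (2.6) translate directly into code. The following C++ sketch computes the sub-sampling factor and the spatial extent for a given image size; the struct and function names are made up for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sub-sampling factor K and spatial extent E of the structuring element.
struct StructuringElement { int K; int E; };

// p = max{0, round(0.5 * log2(W * H) - 8)}, K = 2^p, E = 8K.
StructuringElement structuring_element(int width, int height) {
    assert(width > 0 && height > 0);
    const double x = 0.5 * std::log2(static_cast<double>(width) * height) - 8.0;
    const int p = std::max(0, static_cast<int>(std::lround(x)));
    const int K = 1 << p;
    return {K, 8 * K};
}
```

For the roughly 139 x 139 pixel cell images of this thesis the formula yields K = 1, so no sub-sampling takes place and the plain 8 x 8 element is used.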

2.2.3 Color Layout Descriptor 

This is a kind of low-pass filter capturing spatial information. Again it is
inspired by the MPEG-7 specification. The image is first divided into a grid of
8 x 8 blocks. Then interpolation sub-sampling^2 is applied, i.e. the average
color in each block is calculated, giving one representative color for each
block. A 2D discrete cosine transform (DCT-II) is performed on the resulting
8 x 8 matrix. Low-frequency coefficients are selected using the zigzag scanning
order, see figure 2.2. In MPEG-7 the first 6 Y coefficients and the first 3 of
the U and V coefficients are extracted.

Figure 2.2: Zigzag scan order of a 5 x 5 matrix
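
A zigzag scan can be generated by walking the anti-diagonals of the matrix in alternating directions. The C++ sketch below does this for a square matrix; it follows the JPEG-style convention of stepping right from the top-left corner first, which is an assumption, as is the function name `zigzag_order`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// (row, column) positions of an n x n matrix in zigzag order, starting at the
// top-left (lowest frequency) coefficient.
std::vector<std::pair<int, int>> zigzag_order(int n) {
    assert(n > 0);
    std::vector<std::pair<int, int>> order;
    order.reserve(static_cast<std::size_t>(n) * n);
    for (int s = 0; s <= 2 * (n - 1); ++s) {  // s = row + column
        const int lo = std::max(0, s - (n - 1));
        const int hi = std::min(s, n - 1);
        if (s % 2 == 0) {
            // Even anti-diagonal: walk up and to the right.
            for (int r = hi; r >= lo; --r) order.emplace_back(r, s - r);
        } else {
            // Odd anti-diagonal: walk down and to the left.
            for (int r = lo; r <= hi; ++r) order.emplace_back(r, s - r);
        }
    }
    return order;
}
```

Taking the first few entries of `zigzag_order(8)` then selects the low-frequency DCT coefficients of the 8 x 8 block matrix.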



2.2.4 Homogeneous Texture Descriptor 



Gabor wavelets have proved to be the best set of features compared to
pyramid-structured wavelet transform (PWT), tree-structured wavelet transform
(TWT) and multi-resolution simultaneous autoregressive model (MR-SAR) based
descriptors. [15] They are used in the MPEG-7 Homogeneous Texture Descriptor
(HTD).

Gabor wavelets are a family of modulated Gaussians. They form a complete
basis set, implying that any given function $f(\cdot, \cdot)$ can be expanded in
terms of these basis functions. However, as they are not orthonormal, there is
redundant information present in a set of coefficients. To decrease that
redundancy I follow the strategy used by Manjunath et al., that is, aligning
the Gaussians such that their half-peaks meet, like in figure 2.3 [24]. To
achieve this we first make a change of variables. The Gaussian is a Gaussian in
both the frequency and the space domain. The width of the Gaussian in the
frequency domain $(\sigma_u, \sigma_v)$ is inversely related to the width of the
Gaussian in the space domain $(\sigma_x, \sigma_y)$. In other words, the wider
the Gaussian in space, the narrower its bandwidth. [21, 23]

\[ \sigma_x = \frac{1}{2\pi\sigma_u}, \qquad \sigma_y = \frac{1}{2\pi\sigma_v} \]

^2 The average of all pixels in the block represents the whole block, as opposed
to replacement sub-sampling, where a single pixel represents the whole block.




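These parameters are needed for scaling

\[ a = (U_{hi}/U_{lo})^{1/(S-1)} \]
\[ \sigma_u = \frac{(a - 1)\,U_{hi}}{(a + 1)\sqrt{2\ln 2}} \]
\[ \sigma_v = \tan\left(\frac{\pi}{2K}\right)
   \left[ U_{hi} - 2\ln 2\,\frac{\sigma_u^2}{U_{hi}} \right]
   \left[ 2\ln 2 - \frac{(2\ln 2)^2 \sigma_u^2}{U_{hi}^2} \right]^{-1/2} \]

Where: $U_{lo} \in \mathbb{R}$ - lower center frequency of interest
       $U_{hi} \in \mathbb{R}$ - upper center frequency of interest
       $m \in [0, S) \subset \mathbb{Z}^+$ - scale index
       $S \in \mathbb{N}$ - number of scales
       $a > 1 \in \mathbb{R}$ - scale factor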

For different orientations the image needs to be rotated before filtering and
scaled with respect to $a$.

\[ x' = a^{-m}(x\cos\theta + y\sin\theta) \]
\[ y' = a^{-m}(-x\sin\theta + y\cos\theta) \]
\[ \theta = n\pi/K \]

Where: $n \in [0, K) \subset \mathbb{Z}^+$ - orientation index
       $K$ - number of orientations
       $\theta \in [0, \pi)$ - orientation angle
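
As a rough sketch of how the scaling parameters of the filter bank could be computed, the C++ function below evaluates the scale factor and the two frequency-domain widths. It assumes the commonly published form of the filter bank design by Manjunath and Ma, in which the half-peaks of neighbouring filters touch; the struct `GaborParams` and the function name are hypothetical, and the example values in the usage note are assumptions rather than the settings of this thesis.

```cpp
#include <cassert>
#include <cmath>

// Scale factor a and frequency-domain widths sigma_u, sigma_v of the bank.
struct GaborParams { double a; double sigma_u; double sigma_v; };

GaborParams gabor_params(double u_lo, double u_hi, int scales, int orientations) {
    assert(u_lo > 0.0 && u_hi > u_lo && scales > 1 && orientations > 0);
    const double pi = 3.14159265358979323846;
    const double two_ln2 = 2.0 * std::log(2.0);
    // a = (U_hi / U_lo)^(1 / (S - 1))
    const double a = std::pow(u_hi / u_lo, 1.0 / (scales - 1));
    // sigma_u = (a - 1) U_hi / ((a + 1) sqrt(2 ln 2))
    const double sigma_u = (a - 1.0) * u_hi / ((a + 1.0) * std::sqrt(two_ln2));
    // sigma_v = tan(pi / 2K) [U_hi - 2 ln2 sigma_u^2 / U_hi]
    //           [2 ln2 - (2 ln2)^2 sigma_u^2 / U_hi^2]^(-1/2)
    const double sigma_v =
        std::tan(pi / (2.0 * orientations)) *
        (u_hi - two_ln2 * sigma_u * sigma_u / u_hi) /
        std::sqrt(two_ln2 - two_ln2 * two_ln2 * sigma_u * sigma_u / (u_hi * u_hi));
    return {a, sigma_u, sigma_v};
}
```

With, for example, a lower frequency of 0.05, an upper frequency of 0.4, four scales and six orientations, the scale factor comes out as 2, i.e. neighbouring scales are one octave apart.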
The generated filter bank consists of matrices that should be convolved with the
image

\[ I' = I * G \]
Where: $*$ - the convolution operator

See section 2.3 for details about 2D convolution. In figure 3.1 images of the
Gabor wavelet filter bank kernels of different orientations are presented.

In MPEG-7, rotation invariance is achieved in this descriptor by rotating
the features in the direction of the dominant direction.



2.2.5 Visual Texture Features 

The features described in the article by Amadasun and King are implemented.
These are features corresponding to properties of texture that humans can
perceive. In the article, measures of coarseness, contrast, busyness, complexity
and strength are introduced and compared by rank with how humans sensed
ten natural textures from the widely used Brodatz album. I give here a very
brief overview of the proposed measures. They all use a column vector called the
neighborhood gray-tone difference matrix (NGTDM). [1]



Neighborhood Gray-Tone Difference Matrix

For a pixel $p$ with coordinates $(k, l)$, the mean gray-tone over its
neighborhood of size $d$, i.e. the square surrounding the pixel but without the
center pixel, is calculated:

\[ \bar{A}_p = \bar{A}(k, l) = \frac{1}{W - 1}
   \left[ \sum_{m=-d}^{d} \sum_{n=-d}^{d} I(k + m, l + n) \right],
   \quad (m, n) \ne (0, 0) \tag{2.7} \]

Where: $W = (2d + 1)^2$
       $I(k, l)$ - gray-tone of the pixel at $(k, l)$

The $i$th entry in the NGTDM is the sum of the deviations of gray-tone $i$ from
the neighborhood mean, taken over the center pixels with that gray-tone. Only
those pixels in the image which do not lie in the peripheral regions of width
$d$ are considered.

\[ s(i) = \begin{cases}
     \sum_{p \in N_i} |i - \bar{A}_p|, & \text{if there is a pixel with gray-tone } i \\
     0, & \text{otherwise}
   \end{cases} \tag{2.8} \]

Where: $N_i$ - the pixels with gray-tone $i$
       $G_h$ - the largest gray-tone

The relative frequency, i.e. the probability of occurrence, of the different
gray-tones is calculated as [1]:

\[ p_i = |N_i|/n, \qquad n = (\text{width} - 2d)(\text{height} - 2d). \tag{2.9} \]
Note that (2.9) allows a rectangular region of interest as opposed to the square
regions used in the article by Amadasun and King, and that $n$ replaces $n^2$ in
the formulas.

Coarseness 

Coarseness is a measure of how rough a surface is, e.g. how large the particles
it is composed of are.

\[ f_{cos} = \left[ \epsilon + \sum_{i=0}^{G_h} p_i\, s(i) \right]^{-1} \]

This is (inversely) a weighted sum of the deviations of the center pixels from
their surrounding pixels. The small value $\epsilon$ is there to cope with
division by 0.
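
The NGTDM and the coarseness measure can be sketched in C++ as follows. This is a minimal illustration under stated assumptions: the image is a row-major grid of integer gray-tones, the names `Ngtdm`, `compute_ngtdm` and `coarseness` are made up for the example, and the default value of `eps` is arbitrary.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<int>>;  // gray-tones, img[row][column]

struct Ngtdm {
    std::vector<double> s;  // s(i): summed deviation for gray-tone i
    std::vector<double> p;  // p_i: relative frequency of gray-tone i
};

// Build the NGTDM for neighborhood size d, skipping the border of width d.
Ngtdm compute_ngtdm(const Image& img, int d, int max_tone) {
    const int height = static_cast<int>(img.size());
    const int width = height > 0 ? static_cast<int>(img[0].size()) : 0;
    assert(d > 0 && height > 2 * d && width > 2 * d);
    Ngtdm out{std::vector<double>(max_tone + 1, 0.0),
              std::vector<double>(max_tone + 1, 0.0)};
    const double W = (2.0 * d + 1.0) * (2.0 * d + 1.0);
    const double n = static_cast<double>(width - 2 * d) * (height - 2 * d);
    for (int k = d; k < height - d; ++k) {
        for (int l = d; l < width - d; ++l) {
            double sum = 0.0;  // neighborhood sum without the center pixel
            for (int m = -d; m <= d; ++m)
                for (int q = -d; q <= d; ++q)
                    if (m != 0 || q != 0) sum += img[k + m][l + q];
            const int tone = img[k][l];
            assert(tone >= 0 && tone <= max_tone);
            out.s[tone] += std::abs(tone - sum / (W - 1.0));
            out.p[tone] += 1.0 / n;
        }
    }
    return out;
}

// Coarseness: inverse of the frequency-weighted sum of the NGTDM entries.
double coarseness(const Ngtdm& t, double eps = 1e-9) {
    double sum = 0.0;
    for (std::size_t i = 0; i < t.s.size(); ++i) sum += t.p[i] * t.s[i];
    return 1.0 / (eps + sum);
}
```

A perfectly uniform image gives all-zero NGTDM entries, so the coarseness is then bounded only by `1 / eps`, which is the case the small constant is there to handle.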

Contrast 

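High contrast means that the intensity difference between neighboring regions is
large.

\[ f_{con} = \left[ \frac{1}{N_g(N_g - 1)} \sum_{i=0}^{G_h} \sum_{j=0}^{G_h} p_i p_j (i - j)^2 \right]
   \left[ \frac{1}{n} \sum_{i=0}^{G_h} s(i) \right] \]

\[ N_g = \sum_{i=0}^{G_h} Q_i, \qquad
   Q_i = \begin{cases} 1, & \text{if } p_i \ne 0 \\ 0, & \text{otherwise} \end{cases} \]

Where: $N_g$ - the number of different gray-tones present in the image

The first factor is used to reflect the dynamic range of the gray scale, weighted
with the product of the relative frequencies of the two gray-tone values under
consideration. The second factor increases with the amount of local variation in
intensity.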



Busyness 

A busy texture is one where the spatial frequency of intensity changes is high.

\[ f_{bus} = \frac{\sum_{i=0}^{G_h} p_i\, s(i)}
                 {\sum_{i=0}^{G_h} \sum_{j=0}^{G_h} |i\,p_i - j\,p_j|},
   \qquad p_i \ne 0,\ p_j \ne 0 \]

The numerator is a measure of the spatial rate of change in intensity, inversely
related to coarseness. The denominator is a summation of the magnitude of
differences between the different gray-tone values. This formula differs slightly
from the one described in the article by Amadasun and King [1]: I am certain
there is a typo in that formula which makes it always zero.
Complexity

Complexity means high information content. This could mean many primitives
or patches, especially if they have different average intensities.

An elaborate description of this formula (and the others in this section) is
found in the article by Amadasun and King [1].



Texture strength 

A texture is generally referred to as strong if its building blocks are easily
definable and clearly visible. Such textures tend to look attractive. However, a
strong texture is difficult to define concisely [1]. It is defined as

\[ f_{str} = \frac{\sum_{i=0}^{G_h} \sum_{j=0}^{G_h} (p_i + p_j)(i - j)^2}
                 {\epsilon + \sum_{i=0}^{G_h} s(i)},
   \qquad p_i \ne 0,\ p_j \ne 0 \]

where the numerator is a factor stressing the differences between intensity
levels, and therefore may reflect intensity differences between adjacent
primitives. The probabilities $p_i$ tend to be high for large primitives. The
denominator would be small for coarse textures and high for busy or fine
textures, considering the definition in (2.8).



2.3 Fast 2D Convolution 

Two-dimensional discrete convolution in the spatial domain is defined as

\[ (f * g)[n] = \sum_{m=-\infty}^{\infty} f[m] \cdot g[n - m]. \]

By the Circular Convolution Theorem this can instead be done in the
frequency domain, considering

\[ \mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\} \tag{2.10} \]

Where: $*$ - the convolution operator
       $\mathcal{F}$ - the Fourier Transform (FT)

First apply the FT to the image and to the convolution kernel, then multiply the
two matrices element-wise. To get the filtered image, just apply the inverse FT.

For this to work the kernel has to be placed in a matrix of the same size as
the image, wrapped around the origin^3, which in FFTW is at position (0,0),
like in figure 2.4. Also, there are border cases in the image: it has to be
padded with wraparound pixels. [3]

^3 The origin is also known as the DC component, the zero frequency.
Figure 2.4: How to make sure the kernel wraps around the origin in frequency
space. The figure shows an example 5 x 5 kernel, with entries 11 to 55 numbered
by row and column, placed in an image height x image width matrix.
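
The wrap-around placement of the kernel can be sketched as below. The C++ function copies a small kernel into a zero matrix of the image size so that the kernel center lands at position (0,0) and the rows and columns before the center wrap to the far edges. The name `wrap_kernel` is hypothetical, and the sketch only prepares the padded matrix; the transforms themselves are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Place `kernel` in a height x width zero matrix with its center at (0, 0);
// entries above or left of the center wrap around to the opposite edges.
Matrix wrap_kernel(const Matrix& kernel, std::size_t height, std::size_t width) {
    const std::size_t kh = kernel.size();
    const std::size_t kw = kh > 0 ? kernel[0].size() : 0;
    assert(kh > 0 && kw > 0 && kh <= height && kw <= width);
    Matrix padded(height, std::vector<double>(width, 0.0));
    const std::size_t center_row = kh / 2;
    const std::size_t center_col = kw / 2;
    for (std::size_t r = 0; r < kh; ++r) {
        for (std::size_t c = 0; c < kw; ++c) {
            const std::size_t row = (r + height - center_row) % height;
            const std::size_t col = (c + width - center_col) % width;
            padded[row][col] = kernel[r][c];
        }
    }
    return padded;
}
```

For the 5 x 5 example kernel, the center entry 33 ends up at (0,0), the entry 11 at the bottom-right region of the padded matrix and the entry 55 two rows and columns in from the top-left corner.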



2.4 Scaling data 

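Scaling is very important. If scaling is not applied to all features, a feature
with a larger numeric range may dominate others with a smaller numeric range.

\[ \text{range} = \max_i x_i - \min_i x_i \]
\[ \text{midrange} = \left( \max_i x_i + \min_i x_i \right) / 2 \]
\[ x_i' = \begin{cases}
     \dfrac{x_i - \text{midrange}}{\text{range}/2}, & \text{range} \ne 0 \\
     0, & \text{range} = 0
   \end{cases} \]

Where: $x_i$ - feature value of example $i$
       $i \in [0, \ell) \subset \mathbb{Z}^+$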
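
The scaling rule maps every feature to the interval [-1, 1]. A minimal C++ sketch of it for the values of a single feature across all examples could look like this; the function name `scale_feature` is an assumption of the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Scale the values of one feature to [-1, 1] using range and midrange.
// A constant feature (range == 0) is mapped to 0 for every example.
std::vector<double> scale_feature(const std::vector<double>& values) {
    std::vector<double> scaled(values.size(), 0.0);
    if (values.empty()) return scaled;
    const auto mm = std::minmax_element(values.begin(), values.end());
    const double lo = *mm.first;
    const double hi = *mm.second;
    const double range = hi - lo;
    if (range == 0.0) return scaled;
    const double midrange = (hi + lo) / 2.0;
    for (std::size_t i = 0; i < values.size(); ++i)
        scaled[i] = (values[i] - midrange) / (range / 2.0);
    return scaled;
}
```

In practice the range and midrange would be computed on the training set and then reused unchanged for unseen examples, so that both are scaled in the same way.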
Material and Methods



3.1 Material 

Blood samples were taken from four individuals. The cells were photographed
on a CellaVision DM-96. The width of the images lies in the range [119, 267].
The height of the images lies in the range [119, 258]. On average an image is
about 139 x 139 pixels. This corresponds to about 13.7 µm.

The cells are normal, i.e. there are no cancer cells or malaria-infected cells.
There are very few (2) blast cells, indicating that the only possible cancer
type would be lymphoma, i.e. a cancer in the lymph nodes.

The cells were classified on the CellaVision DM-96 and its result was
taken as ground truth. The machine is 90% to 95% correct depending on the
individual. The cell types of the data set are given in table 3.1. Typical
relative frequencies of the cells are found in table 1.1. Typical images of some
common cells are found in figure 1.1.

From the set of images of the cells a range of descriptors, or features, were
extracted. A set of features extracted from a single image, called an instance
or example, is denoted $x$ and the space of all possible features is denoted
$\mathcal{X}$.

A Support Vector Machine (SVM) was trained using the set of features
described.

3.2 Implementation details 
3.2.1 Support Vector Machine 

The SVM was written in C++ within the Boost C++ Libraries framework.
The Gram matrix $G$, defined in (2.4), i.e. the output of the kernel function, is
cached in memory to dramatically reduce running time.

A Stochastic Gradient Ascent Variant

Stochastic gradient ascent differs from ordinary gradient ascent in that the
updated coefficients are used right away, instead of in the next iteration.



Class No.   Class Name
1           neutrophil granulocytes, segmented
6           neutrophil granulocytes, band
2           eosinophil granulocytes
3           basophil granulocytes
4           lymphocytes
7           lymphocytes, variants
5           monocytes
9           myelocytes
10?         (illegible)
11          blast, immature cell
21          artifacts
24          broken cell
25          thrombocytes (platelets)
29          clots of thrombocytes

Table 3.1: Cell types classified in the data set



In this project a variant of the stochastic gradient ascent method of training
an SVM was implemented.

The coefficients $\alpha_{KKT}$ that violate the Karush-Kuhn-Tucker (KKT)
conditions are selected first for update. They are likely the ones that will
affect the solution most rapidly. When these satisfy the KKT conditions, or
when no progress has been made for some iterations, the greater problem of
updating all coefficients $\alpha$ is considered.



Multiclass SVM 



I use the one-against-the-rest method because it is the simplest and it has 
similar precision to the latter two [3, 20]. The latter two are however faster to 
train because they can train all the classifiers at once. 
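
A minimal sketch of the one-against-the-rest scheme follows. It shows the 
relabelling for one binary sub-problem and a decision rule. Picking the class 
whose classifier returns the largest decision value is a common choice and is 
an assumption here, as are the function names. 

```cpp
#include <cstddef>
#include <vector>

// Labels for one binary sub-problem: +1 for the chosen class, -1 for the rest.
std::vector<int> one_against_rest_labels(const std::vector<int>& classes,
                                         int positive_class) {
    std::vector<int> y;
    y.reserve(classes.size());
    for (int c : classes) y.push_back(c == positive_class ? +1 : -1);
    return y;
}

// decision[c] is the real-valued output of the binary classifier trained
// with class_ids[c] as the positive class. Both vectors must be non-empty
// and of equal length. Returns the class id with the largest output.
int predict_one_against_rest(const std::vector<int>& class_ids,
                             const std::vector<double>& decision) {
    std::size_t best = 0;
    for (std::size_t c = 1; c < decision.size(); ++c)
        if (decision[c] > decision[best]) best = c;
    return class_ids[best];
}
```

One binary classifier is trained per class, so the simplified problem with 
five classes needs five classifiers. 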



3.2.2 Features 
Scalable Color Descriptor 

In MPEG-7 the 3D color histogram bins are reduced in size by truncation and 
encoding (see section 2.2.1). To release the SVM from this hassle it receives the 
values as ordinary real values representing the relative frequency of color 
channel values. The time complexity of calculating this descriptor is bounded by 
O(WH). 
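
A relative-frequency histogram of this kind can be sketched as below. The 
sketch makes assumptions that are not taken from the thesis: it bins RGB 
values directly (MPEG-7 itself works in HSV), uses the same number of bins per 
channel, and the names Rgb and color_histogram are hypothetical. It visits 
each pixel once, which matches the O(WH) bound. 

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Returns bins^3 relative frequencies; they sum to 1 for a non-empty image.
std::vector<double> color_histogram(const std::vector<Rgb>& pixels,
                                    std::size_t bins) {
    std::vector<double> h(bins * bins * bins, 0.0);
    if (pixels.empty()) return h;
    for (const Rgb& p : pixels) {
        // Map each 8-bit channel value to a bin index in [0, bins).
        const std::size_t r = p.r * bins / 256;
        const std::size_t g = p.g * bins / 256;
        const std::size_t b = p.b * bins / 256;
        h[(r * bins + g) * bins + b] += 1.0;
    }
    for (double& v : h) v /= static_cast<double>(pixels.size());
    return h;
}
```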



Color Structure Descriptor 

This is implemented by calculating a histogram for each structuring element 
and then summing over all structuring elements: 

\[
h(m) = \sum_{i=1}^{\frac{W-8K}{K}} \sum_{j=1}^{\frac{H-8K}{K}} J_{i,j}(c_m) \tag{3.1}
\]

where m is the bin index in the final histogram, c_m is the quantized color 
level and J_{i,j} is the histogram for structuring element (i, j). 

Calculating this descriptor is much more expensive than the Scalable Color 
Descriptor described in section 2.2.1: O(((W-8K)/K) · ((H-8K)/K) · 8²) for 
each channel. This is more than a 30-fold increase on a 640 × 480 image 
compared to the above. 

Color Layout Descriptor 

The Discrete Cosine Transform of type DCT-II is calculated using the software 
library FFTW3 (Fastest Fourier Transform in the West). The zigzag scanning 
order described in figure 2.2 is implemented as a C++ STL iterator using 
the simple algorithm presented in listing 3.1. A wider low pass band is used 
than in MPEG-7. The first 10 Y coefficients (6 in MPEG-7) and the first 5 U and 
V coefficients (3 in MPEG-7) are extracted. 

Homogeneous Texture Descriptor 

By symmetry the filter might as well be rotated instead of the image, and since 
that is more efficient that is what is done. The bandwidth b is set to 1 octave 
by relation (3.2) together with the choice of the parameter a. 



In MPEG-7 rotation invariance in this descriptor is achieved by rotating the 
features in the direction of the dominant direction. This is not implemented in 
this project. 

In figure 3.1 images of the Gabor wavelet filter bank kernels of different 
orientations are presented. 

Neighborhood Gray-Tone Difference Matrix 

The A used in the Neighborhood Gray-Tone Difference Matrix (2.7) can be 
divided into subproblems which do not need to be recalculated every time. By 
keeping the center value (m, n) = (0, 0) in the sum (and not writing out the 
normalization), 

\[
A'(k,l) = \sum_{m=-d}^{d} \sum_{n=-d}^{d} f(k+m,\, l+n),
\]
    // Iterator state: start in the corner (0, 0), moving "forward".
    x = 0; y = 0; forward = true; 

    value_type get_current() { return source(x, y); } 

    void next() { 
        if (forward) { 
            // forward diagonal: y increases, x decreases 
            if (y < length - 1) { 
                y++; x--; 
                if (x < 0) {          // stepped outside: clamp and turn 
                    x = 0; 
                    forward = false; 
                } 
            } else if (y == length - 1) { 
                x++;                  // slide along the far edge and turn 
                forward = false; 
            } 
        } else { 
            // backward diagonal: x increases, y decreases 
            if (x < length - 1) { 
                x++; y--; 
                if (y < 0) {          // stepped outside: clamp and turn 
                    y = 0; 
                    forward = true; 
                } 
            } else if (x == length - 1) { 
                y++;                  // slide along the far edge and turn 
                forward = true; 
            } 
        } 
    } 

Listing 3.1: Simplified source for the implemented zigzag order on a 
length × length square matrix 



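it can also be written as 

\[
A'(k,l) =
\begin{cases}
A'(k,l-1) + \sum_{m=-d}^{d} \bigl[ f(k+m,\, l+d) - f(k+m,\, l-d-1) \bigr]
  & \text{using the value above, or} \\
A'(k-1,l) + \sum_{n=-d}^{d} \bigl[ f(k+d,\, l+n) - f(k-d-1,\, l+n) \bigr]
  & \text{using the value to the left.}
\end{cases}
\]

Given the value above or the value to the left, the others can be calculated 
faster. 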



(d) θ = 108°   (e) θ = 144°   (f) θ = 180° 

Figure 3.1: Gabor filter bank at scale = S - 1 at different orientations. Gray 
areas are the ones with zero magnitude, darker is negative, lighter is positive 



To find all A, first fill in a table with all A', from left to right, top-down. 
Then for all positions remove the center value and make sure the accumulated 
value is correctly normalized. The time complexity is thereby reduced from 
O(d²) per pixel to O(d) per pixel. 
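
The table-filling scheme can be sketched like this. The sketch is illustrative 
and not the thesis code: it assumes a row-major gray image with k as column 
and l as row index, handles interior pixels only (so width and height must 
exceed 2d, with d at least 1), and normalizes by the number of neighbours, 
(2d+1)² - 1. The names GrayImage and neighbourhood_means are hypothetical. 

```cpp
#include <cstddef>
#include <vector>

struct GrayImage {
    std::size_t width, height;
    std::vector<double> pix;  // row-major: pix[l * width + k]
    double at(std::size_t k, std::size_t l) const { return pix[l * width + k]; }
};

// Returns A(k,l) for all interior pixels, row-major,
// in a (width - 2d) x (height - 2d) table.
std::vector<double> neighbourhood_means(const GrayImage& f, std::size_t d) {
    const std::size_t w = f.width - 2 * d, h = f.height - 2 * d;
    std::vector<double> a(w * h);
    for (std::size_t l = d; l < f.height - d; ++l) {
        for (std::size_t k = d; k < f.width - d; ++k) {
            double s = 0.0;
            if (k > d) {
                // Reuse the value to the left: add the new right column,
                // drop the old left column. O(d) work.
                s = a[(l - d) * w + (k - d - 1)];
                for (std::size_t n = l - d; n <= l + d; ++n)
                    s += f.at(k + d, n) - f.at(k - d - 1, n);
            } else if (l > d) {
                // First column: reuse the value above instead.
                s = a[(l - d - 1) * w];
                for (std::size_t m = k - d; m <= k + d; ++m)
                    s += f.at(m, l + d) - f.at(m, l - d - 1);
            } else {
                // Very first position: full O(d^2) sum, done only once.
                for (std::size_t n = l - d; n <= l + d; ++n)
                    for (std::size_t m = k - d; m <= k + d; ++m)
                        s += f.at(m, n);
            }
            a[(l - d) * w + (k - d)] = s;  // this is A'(k,l)
        }
    }
    // Remove the center value and normalize by the number of neighbours.
    const double count = static_cast<double>((2 * d + 1) * (2 * d + 1) - 1);
    for (std::size_t l = d; l < f.height - d; ++l)
        for (std::size_t k = d; k < f.width - d; ++k) {
            double& v = a[(l - d) * w + (k - d)];
            v = (v - f.at(k, l)) / count;
        }
    return a;
}
```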

3.2.3 Convolution 

Using the method for convolution described in section 2.3 is much more 
efficient than the naive approach of doing the calculations in the spatial 
domain. It reduces the complexity from O(K²) per pixel, where K is the size 
of the convolution kernel, to O(log N), where the image is N × N in size and 
N = 2^k, k ∈ Z+. The last requirement makes sure that the much more efficient 
Fast Fourier Transform (FFT) can be used instead of a normal Discrete Fourier 
Transform (DFT). 

With the largest kernel used, K² = 91² = 8281, and a 1000 × 1000 image, 
log 1000 ≈ 6.9, a thousandfold speed-up can be achieved. 
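
As a rough check of the figure quoted above, the two per-pixel costs can be 
divided. This ignores all constant factors and assumes the natural logarithm, 
which is what the value 6.9 suggests: 

```latex
\[
\frac{K^2}{\ln N} = \frac{91^2}{\ln 1000} \approx \frac{8281}{6.9} \approx 1200
\]
```

The ratio is of the order of a thousand, in line with the speed-up claimed. 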

These figures are however for FFT on matrices of size N = 2^k. Padding to 
the next larger power of two is not implemented since the software library used 
for FFT, called FFTW¹ (Fastest Fourier Transform in the West), supports other 
sizes too and still provides great speed. 



¹ A heavily used library with an impressive architecture, used in e.g. Matlab 
Figure 3.2: Abstract class (interface) to data views and their realizations. 
The DataView interface declares getClass(i : int) : int, 
getFeature(i : int) : Feature, isBinary() : bool, size() : uint and 
printInfo() : void. The realizations in the diagram include DataSetView, 
ExampleView, ArrayView, DataViewShuffle, DataViewRange(view, lo, hi), 
DataViewConcat(view1, view2), DataViewClassMapBinary(view, positive_class), 
DataViewClassMapLinear, DataViewClassJoin(view, classes) and 
DataViewClassRemove(view, class). 



3.2.4 Data View 



The classifiers view data. Rather than giving them the data structure holding 
the data directly, an abstraction named DataView was built. The abstraction is 
realized in 11 classes, which are found together with their abstract base class 
in figure 3.2. The derived classes can all be used transparently, releasing the 
classifier and data set loader from the tasks of the views. 



The three views below contain pointers to the real data. 

DataSetView             view of data represented by a DataSet instance 

ExampleView             view of data represented by a vector of Example 
                        instances 

ArrayView               view of data from a boost::array, convenient 
                        for the unit tests concerning views 

The views below contain other views and just map their values. They are 
often chained together to get the wanted view. 

DataViewScaled          view the features as if they were in the range 
                        [-1, 1], avoids feature-wise bias 

DataViewRange           select only a subset of the examples, used in e.g. 
                        cross-validation 

DataViewConcat          view two views as if they were one, also used in 
                        cross-validation 

DataViewShuffle         shuffle the order of examples. Splitting an ordered 
                        set, training on the first part and testing on the 
                        other is of course not wanted, since a class may then 
                        be present only in the latter 

DataViewClassMapLinear  if e.g. only classes {0, 3, 42, ...} exist it is 
                        convenient if they can be represented by {0, 1, ...} 

DataViewClassMapBinary  one given class is said to be positive, all others 
                        are said to be negative. Used in the multiclass 
                        classifier 

DataViewClassJoin       join groups of classes into new classes 

DataViewClassRemove     view with a class removed 
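
How such views chain together can be sketched as below. This is an 
illustration under assumptions, not the thesis source: the interface is 
reduced to three of its methods, Feature is taken to be a vector of doubles, 
hi in DataViewRange is treated as exclusive, and VectorView is a hypothetical 
stand-in for the views that hold the real data. 

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Feature = std::vector<double>;  // assumption: one vector per example

// Abstract interface, reduced to three methods for the sketch.
class DataView {
public:
    virtual ~DataView() = default;
    virtual int getClass(std::size_t i) const = 0;
    virtual const Feature& getFeature(std::size_t i) const = 0;
    virtual std::size_t size() const = 0;
};

// Hypothetical leaf view that owns its data.
class VectorView : public DataView {
public:
    VectorView(std::vector<int> classes, std::vector<Feature> features)
        : classes_(std::move(classes)), features_(std::move(features)) {}
    int getClass(std::size_t i) const override { return classes_[i]; }
    const Feature& getFeature(std::size_t i) const override { return features_[i]; }
    std::size_t size() const override { return classes_.size(); }
private:
    std::vector<int> classes_;
    std::vector<Feature> features_;
};

// Examples lo, ..., hi-1 of another view.
class DataViewRange : public DataView {
public:
    DataViewRange(const DataView& view, std::size_t lo, std::size_t hi)
        : view_(view), lo_(lo), hi_(hi) {}
    int getClass(std::size_t i) const override { return view_.getClass(lo_ + i); }
    const Feature& getFeature(std::size_t i) const override { return view_.getFeature(lo_ + i); }
    std::size_t size() const override { return hi_ - lo_; }
private:
    const DataView& view_;
    std::size_t lo_, hi_;
};

// Two views seen as one.
class DataViewConcat : public DataView {
public:
    DataViewConcat(const DataView& a, const DataView& b) : a_(a), b_(b) {}
    int getClass(std::size_t i) const override {
        return i < a_.size() ? a_.getClass(i) : b_.getClass(i - a_.size());
    }
    const Feature& getFeature(std::size_t i) const override {
        return i < a_.size() ? a_.getFeature(i) : b_.getFeature(i - a_.size());
    }
    std::size_t size() const override { return a_.size() + b_.size(); }
private:
    const DataView& a_;
    const DataView& b_;
};

// One class is positive (+1), all others negative (-1).
class DataViewClassMapBinary : public DataView {
public:
    DataViewClassMapBinary(const DataView& view, int positive_class)
        : view_(view), positive_(positive_class) {}
    int getClass(std::size_t i) const override {
        return view_.getClass(i) == positive_ ? +1 : -1;
    }
    const Feature& getFeature(std::size_t i) const override { return view_.getFeature(i); }
    std::size_t size() const override { return view_.size(); }
private:
    const DataView& view_;
    int positive_;
};
```

Since the wrapping views only hold references, no example data is copied. The 
wrapped views must outlive the views built on top of them. 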
Experimental Setup and Results 



4.1 Experimental Setup 

The CellaVision™ DM-96 machine achieves an error rate of approx. 5-10% 
depending on the individual. Thus there are errors in the ground truth. 
I have divided the problem into two parts. 

• the primary problem — the SVM should classify all classes present in the 
data set. 

• the simplified problem — some classes are merged and others are removed. 

4.1.1 Performance test method 

In both cases 2-fold cross-validation is used to test performance. This means 
that two models will be trained. In the first, half the data set is the training 
set and the other half is the test set. In the other, the roles of the subsets are 
swapped. This way both halves will act as both training and test sets. 

4.1.2 Description of the simplified problem 

Classes 1 and 6, the segmented and band variants of neutrophil granulocytes, 
are merged to form class 30. Even human experts have approx. 25% error rate on 
these. It is often more a matter of opinion than of objective decision. 

Classes 4 and 7, lymphocytes and their variants, are joined. The variants 
are rather uncommon: there are only 8 instances in the dataset, compared to 
160 lymphocytes. Due to the skewed distribution these are merged to form 
class 31. 

The following classes are removed. One class consists of unidentified objects; 
it is a very heterogeneous group but there are only 6 of them. Class 21 are 
artifacts, random garbage, and they are removed. Class 24 are broken cells; 
there are only 7 of them. Classes 25 and 29 are thrombocytes, i.e. platelets, 
and clots of them. Since they aren't even white blood cells they are removed. 
Class 11, called blast, is a kind of immature cell which would be interesting 
to classify, but there are only two of them so they are removed as well. 
Classes 9 and 10 are myelocytes and meta-myelocytes, which are a development 
stage of different granulocytes. There can be e.g. eosinophilic myelocytes and 
basophilic myelocytes. In the dataset they are also too rare to train a 
general classifier. There are only a total of 4 myelocytes in this 
heterogeneous group. All classes that are left are presented again in 
table 4.1. 

Class No.   Class Name 

30 (1+6)    neutrophil granulocytes 
 2          eosinophil granulocytes 
 3          basophil granulocytes 
31 (4+7)    lymphocytes and variants 
 5          monocytes 

Table 4.1: Cell types left in the simplified problem 

4.2 Results 

4.2.1 Primary Problem 

The error rate in the primary problem is 10.8%. The type of kernel function 
that was the most successful was the polynomial kernel. This is compared to 
the slightly better result using libSVM, 9.6%. See table 4.2. 

Most confusion occurs between classes 1 (segmented neutrophil granulo- 
cytes) and 6 (band neutrophil granulocytes). Much confusion is also present 
when recognizing class 3 (basophil granulocytes) — they are often (2 of their 
total of 7) misclassified as class 1 (segmented neutrophil granulocytes), which 
is a very large group. 

4.2.2 Simplified Problem 

In the simplified problem the error rate is 3.1%. Also in this problem the most 
successful kernel was the polynomial kernel. This is compared to the better 
result using libSVM, 2.3%. See table 4.4. 

In the simplified problem most confusion (by number) occurs between class 
5 (monocytes) and the new class 31 (lymphocytes). By percentage the largest 
confusion occurs between class 30 (segmented and band neutrophil 
granulocytes) and class 3 (basophil granulocytes). Class 3 has only 8 
instances, of which 3 were misclassified as 30. 



Kernel Type            Error Rate (%)           Parameters 
                       Total    Max     Min 

Implemented SVM Results 

RBF with LP' norm      11.5     12.0    11.1    = 20 
RBF with norm          11.5     12.0    11.1    = 22 
Polynomial             11.8     12.5    11.1    d = 2 
Polynomial             11.1     11.5    10.6    d = 3 
Polynomial             11.5     12.0    11.1    d = 4 
Polynomial             10.8     11.1    10.6    d = 5 
Polynomial             11.3     12.0    10.6    d = 6 

libSVM Results 

RBF                     9.6                     C = 512, γ⁻¹ = 8192 

Table 4.2: SVM cell classifier results for the primary problem 



Number of Confusions 

                      Guessed Class 
Class    (n)          1    2    3    4    5    6    7   11   21   24 

         (4) 
  1    (205)                              ·    1 
  2     (14) 
  3      (7)     ·    2    ·    ·    · 
  4    (104)                                        2    ·    1    · 
  5     (32)     ·    ·    ·    ·    2              1    ·    ·    · 
  6     (12)     1    6    ·    ·    ·    1    · 
  7      (6)     ·    ·    ·    ·    1    1    · 
 11      (1)     ·    ·    ·    ·    1 
 21     (31)     ·    ·    1    ·    · 
 24      (1)                                        1 

Table 4.3: Confusion Matrix for the primary problem 
Kernel Type     Error Rate (%)           Parameters 
                Total    Max     Min 

Implemented SVM Results 

RBF             3.6      4.4     2.9     C = 128, σ² = 16 
RBF             3.3      4.2     2.3     C = 512, σ² = 128 
Polynomial      3.1      3.4     2.9     d = 3 
Polynomial      3.5      3.9     3.1     d = 5 

libSVM Results 

RBF             2.5                      C = 8, γ⁻¹ = 128 
Polynomial      2.3                      C = 8, γ⁻¹ = 128, d = 3 
Polynomial      3.5                      C = 8, γ⁻¹ = 128, d = 5 

Table 4.4: SVM cell classifier results for the simplified problem 



Number of Confusions 

                 Guessed class 
Class    (n)     2    3    5    30   31 

  2     (20)     · 
  3      (8)     ·    ·    ·    3 
  5     (56)     ·    ·    ·    ·    2 
 30    (517)     ·    ·    1 
 31    (168)     ·    ·    7    · 

Table 4.5: Confusion matrix for the simplified problem 
Discussion 



The accuracy achieved in the primary problem was 89.2% and in the simplified 
problem 96.9%. I regard these results as good when compared to the 
CellaVision™ DM-96's result on the primary problem, 90-95%. One has to 
consider that there are errors in the ground truth misleading the SVM. Thus, 
it is uncertain whether the results are better or worse than those of the 
DM-96. Because the DM-96 has an error rate of about 5-10%, a 0% error rate in 
the primary problem would mean something like 5-10% error, while a 5% error 
could possibly mean 0-15% error. 
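
The interval reasoning above can be written as a triangle inequality. This is 
a sketch under the assumption that every cell has exactly one true class. The 
symbols are introduced here for illustration: e_obs is the measured 
disagreement between the SVM and the ground truth, e_gt is the error rate of 
the ground truth and e_true is the true error rate of the SVM. 

```latex
\[
\left| e_{\mathrm{obs}} - e_{\mathrm{gt}} \right|
\;\le\; e_{\mathrm{true}}
\;\le\; e_{\mathrm{obs}} + e_{\mathrm{gt}}
\]
```

With e_gt between 5% and 10%, e_obs = 0% gives e_true = e_gt, i.e. 5-10%, 
while e_obs = 5% gives an e_true anywhere between 0% and 15%. 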

I conclude that using the combination of MPEG-7 descriptors and visual 
texture features together with an SVM to classify cells works well, but further 
investigation is needed to find out how well. A more comprehensive study could 
investigate whether a set of SVM or ANN variants perform better on the set 
of features implemented or on the set of features developed at CellaVision™. 

I would like to stress that using an SVM instead of an Artificial Neural 
Network, as in the CellaVision™ DM-96 machine, is more statistically rigorous: 
confidence intervals of the classifier can be found, which to my knowledge is 
impossible in ANNs. In medicine it is important to know the strength of the 
method used. 

It would be very interesting to test the features on the real training set they 
have developed at CellaVision™. The company has a training set of thousands 
of cells classified by field experts. Some cell images required five experts to be 
certain of the cell type. Without the errors in the ground truth the results 
could possibly compete with the CellaVision™ machine. 

The result of the primary problem shows that the most confused instances 
are those that are guessed to be a segmented neutrophil (class 1) but that 
are a band neutrophil (class 6) in the ground truth. These two often look 
very similar, and humans often have different opinions about which class a 
cell belongs to. The CellaVision™ DM-96 also has problems with these classes, 
indicating that there are several errors in the ground truth. The errors in the 
ground truth probably mislead the SVM. There are only 12 cell images of class 6, 
of which some have the wrong class, and there are 205 images of class 1, of 
which not all are truly class 1. This situation pushes bias towards the larger 
class. 
In the result of the simplified problem most confusion occurs between the 
monocytes (class 5) and the lymphocytes (class 31). This was expected as they 
are very hard to classify for both humans and the CellaVision™ DM-96. Late 
in writing this thesis I discovered that there is a great discrepancy in size 
between these two types of cells. The discrepancy indicates that the size of 
the cells could be used as a feature too. 

Practical Use 

Even the simplified problem would give useful information when applied to 
medicine. Standard measures used in diagnosis involve counting the total num- 
ber of white blood cells, leukocytes, determining the distribution of lympho- 
cytes and granulocytes and determining the number of monocytes. 

Malaria-infected cells and cancer cells look different compared to healthy 
blood cells. It would be interesting to test the features on these kinds of 
cells to be able to classify them as well. 

Runtime Performance 

To increase cache performance in the Color Structure Descriptor (section 2.2.2) 
it would be wise to first extract all sub-samples, i.e. the representative color 
for each K × K area, as the other pixels aren't used. They will otherwise 
quickly fill up the cache during memory pre-fetch. Now, the sub-samples are 
viewed using a sub-sampling view present in the Generic Image Library 
(boost::gil), contributed to Boost by Adobe. The views in GIL are virtual, 
meaning they only keep information about offset calculations and no data is 
duplicated. 

The 2D convolution was first done in the spatial domain but I soon realized 
it was far too slow with my bigger Gabor filter kernels, of which the largest 
are 91² pixels big. Instead the calculations are done in the frequency domain, 
which is much faster, see sections 2.3 and 3.2.3. 

To improve performance of the Gabor Wavelet Filter further the kernels 
should of course be kept in memory when generating features of many images, 
however they are not. 

To improve SVM training performance the gradient ascent training algorithm 
must be replaced or at least improved. The algorithm implemented divides the 
problem into a subproblem where the coefficients violating the KKT conditions 
are optimized first. This is a heuristic called chunking in the literature. By 
using this, fewer elements of the Gram matrix, and their corresponding support 
vectors, need to be kept in memory. This is something I don't take advantage 
of because I had enough memory for my purposes. By refining chunking into 
decomposition, where a fixed size chunk is optimized, more data points can be 
used and convergence speed is increased. Sequential Minimal Optimization (SMO) 
takes decomposition to the extreme and optimizes only two coefficients at a 
time, and can thereby make sure that the KKT condition Σ_i α_i y_i = 0 is 
always true. LibSVM uses a variant of this approach and it offers great 
performance. 



Beyond Gabor Filters 

If modeling human brains is the objective, considering other approaches than 
the Gabor wavelet would be interesting. A type of neuron in the primary visual 
cortex, called simple cells, has been recorded from monkey and cat. 
The recordings and the elaborate analytical discussion in an article by 
Wallis show that both difference of Gaussian × Gaussian (DoGG) and Cauchy 
functions model cortical cells better than Gabor wavelets for the measured 
parameters [21]. In an article by Ashour et al. three other types of transforms 
are suggested: ridgelets, curvelets and contourlets. Perhaps they can show 
increased performance. 
Appendix A 



Software Usage 



The software produced in this project can be found at 

• http://tobbe.nu/pub/2008/cell.morph.mpeg7.svm/ 

The software has only been tested on an Ubuntu Linux system. However, 
the software is written in portable C99 C++ and should work on all *nix 
platforms that can supply the dependencies, perhaps even under Cygwin on 
MS Windows. The dependencies are 

• C99 compliant C++ compiler (GNU g++ tested) 

• Boost C++ Libraries, http://www.boost.org/ 

• FFTW3 (Fastest Fourier Transform in the West 3), http://www.fftw.org/ 

• GSL (GNU Scientific Library), http://www.gnu.org/software/gsl/ 

• libjpeg 

• libpng 

Below is a brief overview on how to use the most important programs in 
the software package. There are other programs in the package but they are 
mostly related to testing. 
A.1 train - Train a model 



This is the program where most processing is done. It can 

• train a model from a dataset 

• test a model with a dataset 

• load and/or save a model from/to a file 

• perform cross-validation 

Here is the syntax of the program train 



MAIN         ::= (MAIN_HELP | MAIN_DO) 
MAIN_HELP    ::= ./train [-h] 
MAIN_DO      ::= ./train MODE DATASET 
                 MODEL_PARAMS SAVE_MODEL 
MODE         ::= LOAD_MODEL XVALIDATION 
LOAD_MODEL   ::= -l MODEL.model 
XVALIDATION  ::= -f N_FOLDS 
N_FOLDS      ::= 1 | INTEGER 
DATASET      ::= -d INTEGER 
MODEL_PARAMS ::= -k KERN -p KERN_PARAM 
                 -C DOUBLE -g GAP_TOL -m TERM 
KERN         ::= KERN_LIST | KERN_TYPE 
KERN_LIST    ::= 
KERN_TYPE    ::= 1|2|3|4|5|6 
KERN_PARAM   ::= DOUBLE 
GAP_TOL      ::= DOUBLE 
TERM         ::= BITMASK 
BITMASK      ::= 1|2|3 
SAVE_MODEL   ::= -o MODEL.model 
Both cross-validation and saving of a model can be performed at the same 
time if wanted. However, this means that train will create one model for 
each fold, and only the last one will be saved. If cross-validation is 
not wanted, pass one fold (-f 1). The double precision floating point number 
passed with -C is a number used in the classifier; it is related to the KKT 
conditions. The gap tolerance is also a double precision floating point number, 
which is used as a convergence criterion. It is the allowed gap between the 
primal and dual objective functions, the feasibility gap, which should be a 
small number. The default gap tolerance is a small negative power of ten. The 
-m terminator is a bit-mask which controls when a classifier is considered 
optimal, i.e. when training will stop. The feasibility gap constraint is not 
used if -m 2 is passed, i.e. when the first bit (1) is zero. The primary 
training terminator bit is 2, which means that all KKT conditions must be 
satisfied to terminate training. The default of 3 means that both these 
conditions must be satisfied. 
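
The meaning of the bit-mask can be sketched as below. The identifiers are 
hypothetical and not taken from the source code of train. Treating a mask of 
0 as "never stop on these criteria" is an assumption, since the syntax only 
allows 1, 2 and 3. 

```cpp
// Bits of the -m terminator mask as described in the text.
enum Terminator : unsigned {
    kFeasibilityGap = 1u,  // gap between primal and dual objective is small
    kKktSatisfied   = 2u   // all KKT conditions are satisfied
};

// True when every criterion selected in mask is fulfilled.
bool should_stop(unsigned mask, double gap, double gap_tol, bool kkt_ok) {
    if (mask == 0u) return false;  // assumption: no criterion selected
    if ((mask & kFeasibilityGap) && gap > gap_tol) return false;
    if ((mask & kKktSatisfied) && !kkt_ok) return false;
    return true;
}
```

With the default mask 3, training stops only when the gap is within tolerance 
and the KKT conditions hold. With -m 2 the gap is ignored. 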
A.2 cellfeatures - Generate examples from the cell database 

To generate features from all pairs of (image, ground truth class) in the cell 
database the program cellfeatures is used. The file cellfeatures.data is 
backed up before writing the features generated to it. This file can be used by 
the program train. 

MAIN      ::= (MAIN_HELP | MAIN_DO) 
MAIN_HELP ::= ./cellfeatures 
MAIN_DO   ::= ./cellfeatures DB 



A.3 jpeg_genfeature - Feature generation from images 

To generate a set of features from image(s) the program called jpeg_genfeature 
is used. It generates a set of features that can be classified later with 
predict. 

MAIN      ::= (MAIN_HELP | MAIN_DO) 
MAIN_HELP ::= ./jpeg_genfeature -? 
MAIN_DO   ::= ./jpeg_genfeature CROPIMAGE* -o FEATURESET.feat 
CROPIMAGE ::= -i IMAGE.jpeg [-x left -y top -w width -h height] 



A.4 predict - Predicting a set of features 

To predict a set of features, generated by jpeg_genfeature, the program called 
predict is used. It needs a previously trained model generated by train. 

MAIN      ::= (MAIN_HELP | MAIN_DO) 
MAIN_DO   ::= ./predict -l MODEL.model -f FEATURESET.feat 
MAIN_HELP ::= ./predict -? 



A.5 extractcelltype - Extract a class of images from the cell database 

To extract a specific class (as classified by CellaVision™ DM-96) from the cell 
database, the program called extractcelltype is used. 

MAIN        ::= (MAIN_HELP | MAIN_DO) 
MAIN_HELP   ::= ./extractcelltype 
MAIN_DO     ::= ./extractcelltype CLASS DB 
CLASS       ::= INTEGER 
DB          ::= ALLXMLFILES | (XMLFILE ' ')* 
ALLXMLFILES ::= 
A.6 extractcellid - Extract given instances from the cell database 

To extract given instances from a list of id numbers, the program called 
extractcellid is used. 

MAIN      ::= (MAIN_HELP | MAIN_DO) 
MAIN_HELP ::= ./extractcellid 
MAIN_DO   ::= ./extractcellid IDLIST DB 
IDLIST    ::= (INTEGER ' ')* 'x' 



A.7 extractcellinfo - Extract statistics of instances from the cell database 

To extract statistics about size, resolution and number of instances of a 
specific class or of all classes, the program called extractcellinfo is used. 

MAIN      ::= (MAIN_HELP | MAIN_DO) 
MAIN_HELP ::= ./extractcellinfo 
MAIN_DO   ::= ./extractcellinfo CLASS DB 
CLASS     ::= CLASS_ALL | CLASS 
CLASS_ALL ::= '-1' 



A. 8 tolibsvm — Save cell features in libSVM format 

This program loads the features saved in cellfeatures.data and dumps them 
in libSVM format on standard output. It takes no parameters. 

./tolibsvm > cellfeatures.data.libsvm 






